Pattern Matching Machine for Text Compressed Using Finite State Model

Authors

  • Masayuki Takeda
  • M. Takeda
Abstract

The classical pattern matching problem is to find all occurrences of patterns in a text. In many practical cases the text is very large and held in secondary storage, so the time for pattern matching is dominated by data transmission of the text. Text compression can therefore speed up pattern matching. In this framework, an efficient pattern matching algorithm is required that searches the compressed text directly, without decoding. In 1992, Fukamachi et al. proposed a method of constructing a pattern matching machine that runs on Huffman-coded text, based on the Aho-Corasick algorithm. However, since the Huffman code is optimal only under the assumption of a memoryless source model, the compression ratio is not very high. On the other hand, it is known that English text can be highly compressed by compression methods based on the Markov model. In this paper, we focus on the finite-state model, which subsumes the Markov model as an important special case, and present an algorithm for constructing a pattern matching machine for text compressed under this model. We also give a proof of the correctness of the algorithm.
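
To make the idea of matching directly on coded text concrete, here is a minimal Python sketch. It is not the construction of Fukamachi et al. nor the algorithm of this paper. It assumes a fixed prefix code (as a Huffman code for a memoryless source would give) and simply combines an Aho-Corasick automaton over symbols with the code's trie, so that one table lookup is done per input bit. The function names `build_ac`, `build_bit_machine`, and `search_bits`, and the toy code table in the usage note, are hypothetical.

```python
from collections import deque


def build_ac(patterns, alphabet):
    """Build a complete Aho-Corasick DFA; returns (delta, out)."""
    goto = [{}]    # trie edges per state
    out = [set()]  # patterns recognised on entering each state
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)
    fail = [0] * len(goto)
    delta = [{} for _ in goto]
    queue = deque()
    for ch in alphabet:
        t = goto[0].get(ch, 0)
        delta[0][ch] = t
        if t:
            queue.append(t)
    while queue:  # breadth-first, so shallower states are finished first
        s = queue.popleft()
        out[s] |= out[fail[s]]
        for ch in alphabet:
            if ch in goto[s]:
                t = goto[s][ch]
                fail[t] = delta[fail[s]][ch]
                delta[s][ch] = t
                queue.append(t)
            else:
                delta[s][ch] = delta[fail[s]][ch]
    return delta, out


def build_bit_machine(code, patterns):
    """code maps each symbol to its codeword (a string of '0'/'1').

    Assumption: the codewords form a complete prefix code.
    Machine states are pairs (AC state, internal code-trie node).
    """
    delta, out = build_ac(patterns, list(code))
    child, symbol = [{}], [None]  # code trie; leaves carry a symbol
    for sym, bits in code.items():
        n = 0
        for b in bits:
            if b not in child[n]:
                child.append({})
                symbol.append(None)
                child[n][b] = len(child) - 1
            n = child[n][b]
        symbol[n] = sym
    trans = {}
    for q in range(len(delta)):
        for n in range(len(child)):
            if symbol[n] is not None:
                continue  # leaves are folded into the transition below
            for b, m in child[n].items():
                if symbol[m] is None:
                    trans[(q, n), b] = ((q, m), ())
                else:  # a codeword ends: take the AC step, restart the trie
                    q2 = delta[q][symbol[m]]
                    trans[(q, n), b] = ((q2, 0), tuple(sorted(out[q2])))
    return trans


def search_bits(bits, trans):
    """Return (index of last bit, pattern) for every occurrence."""
    state, hits = (0, 0), []
    for i, b in enumerate(bits):
        state, found = trans[state, b]
        hits.extend((i, p) for p in found)
    return hits
```

For example, with the toy code `{"a": "0", "b": "10", "c": "11"}`, the text `aabcab` is coded as `001011010`, and `search_bits("001011010", build_bit_machine(code, ["ab", "b"]))` reports both patterns ending at bit indices 3 and 8. Under a finite-state model the codeword table would depend on the model's current state, which this sketch does not attempt to handle.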

Similar resources

Pattern-Matching Problems for

The power of weighted finite automata to describe very complex images has been widely studied, see [5, 6, 7]. Finite automata can also be used as an effective tool for compression of two-dimensional images. There are some software packages using this type of compression, see [12, 6]. We consider the complexity of some pattern-matching problems for two-dimensional images which are highly compressed using...

Speed-up of Aho-Corasick Pattern Matching Machines by Rearranging States

This paper describes a speed-up of string pattern matching obtained by rearranging states in the Aho-Corasick pattern matching machine, which is a kind of finite automaton. We realized a speed-up of string pattern matching using data compression. Although a finite state model gives a higher compression ratio, it does not lead to a speed-up of string pattern matching, because the pattern matching machine beco...

Optimal pattern matching algorithms

We study a class of finite state machines, called w-matching machines, which make it possible to simulate the behavior of pattern matching algorithms while searching for a pattern w. They can be used to compute the asymptotic speed of an algorithm, i.e. the limit of the expected ratio of the number of text accesses to the length of the text, while it parses an i.i.d. text to find the pattern w. Defining the orde...

Linear Pattern Matching with Swaps for Short Patterns

The Pattern Matching problem with swaps is a variation of the classical pattern matching problem. It consists of finding all the occurrences of a pattern P in a text T, when an unrestricted number of disjoint local swaps is allowed. In this paper, we present a new, efficient method for the Swap Matching problem with short patterns. In particular, we present an algorithm constructing a non-dete...

An Algorithm to Compute the Character Access Count Distribution for Pattern Matching Algorithms

We propose a framework for the exact probabilistic analysis of window-based pattern matching algorithms, such as Boyer–Moore, Horspool, Backward DAWG Matching, Backward Oracle Matching, and more. In particular, we develop an algorithm that efficiently computes the distribution of a pattern matching algorithm’s running time cost (such as the number of text character accesses) for any given patte...

Publication date: 1997